44 research outputs found

    Beyond Trending Topics: Real-World Event Identification on Twitter

    User-contributed messages on social media sites such as Twitter have emerged as powerful, real-time means of information sharing on the Web. These short messages tend to reflect a variety of events in real time, earlier than other social media sites such as Flickr or YouTube, making Twitter particularly well suited as a source of real-time event content. In this paper, we explore approaches for analyzing the stream of Twitter messages to distinguish between messages about real-world events and non-event messages. Our approach relies on a rich family of aggregate statistics of topically similar message clusters, including temporal, social, topical, and Twitter-centric features. Our large-scale experiments over millions of Twitter messages show the effectiveness of our approach for surfacing real-world event content on Twitter
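As a rough illustration of what "aggregate statistics of topically similar message clusters" could look like, the sketch below computes a handful of per-cluster features. The `Message` type and the specific feature names are assumptions chosen for illustration; the abstract names only the feature families (temporal, social, topical, Twitter-centric), not these exact quantities.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Message:
    """Hypothetical minimal message record; not the paper's data model."""
    author: str
    timestamp: float  # seconds since some reference point
    text: str


def cluster_features(messages: List[Message]) -> Dict[str, float]:
    """Compute a few illustrative aggregate statistics for one cluster."""
    if not messages:
        raise ValueError("cluster must contain at least one message")
    n = len(messages)
    times = [m.timestamp for m in messages]
    return {
        # Cluster volume.
        "size": float(n),
        # Temporal: how long the cluster's activity lasted.
        "time_span_seconds": max(times) - min(times),
        # Social: share of distinct authors among the messages.
        "distinct_author_ratio": len({m.author for m in messages}) / n,
        # Twitter-centric: share of retweets and of hashtagged messages.
        "retweet_fraction": sum(m.text.startswith("RT ") for m in messages) / n,
        "hashtag_fraction": sum("#" in m.text for m in messages) / n,
    }
```

A feature dictionary like this could then be fed to any standard classifier to separate event clusters from non-event clusters; which classifier the authors used is not stated in the abstract.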

    SONAR: Automatic Detection of Cyber Security Events over the Twitter Stream

    © 2017 ACM. Every day, security experts face a growing number of security events that affect people's well-being, their information systems, and sometimes critical infrastructure. The sooner they can detect and understand these threats, the better they can mitigate and forensically investigate them. Therefore, they need situational awareness of existing security events and their possible effects. However, given the large number of events, it can be difficult for security analysts and researchers to handle this flow of information adequately and answer the following questions in near-real time: What are the current security events? How long do they last? In this paper, we try to answer these questions by leveraging social networks, which contain a massive amount of valuable information on many topics. However, because of the very high volume, extracting meaningful information can be challenging. For this reason, we propose SONAR: an automatic, self-learned framework that can detect, geolocate, and categorize cyber security events in near-real time over the Twitter stream. SONAR is based on a taxonomy of cyber security events and a set of seed keywords describing the types of events that we want to follow in order to start detecting events. Using these seed keywords, it automatically discovers new relevant keywords, such as malware names, to enhance the range of detection while staying in the same domain. Using a custom taxonomy describing all types of cyber threats, we demonstrate the capabilities of SONAR on a dataset of approximately 47.8 million tweets related to cyber security in the last 9 months. SONAR could efficiently and effectively detect, categorize, and monitor cyber security related events before they reached the security news, and it could automatically discover new security terminologies along with their events. Additionally, SONAR is highly scalable and customizable by design; therefore, the SONAR framework could be adapted to virtually any type of event that experts are interested in.

    Crowdsourcing Cybersecurity: Cyber Attack Detection using Social Media

    Social media is often viewed as a sensor into various societal events such as disease outbreaks, protests, and elections. We describe the use of social media as a crowdsourced sensor to gain insight into ongoing cyber-attacks. Our approach detects a broad range of cyber-attacks (e.g., distributed denial of service (DDoS) attacks, data breaches, and account hijacking) in an unsupervised manner using just a limited fixed set of seed event triggers. A new query expansion strategy based on convolutional kernels and dependency parses helps model reporting structure and aids in identifying key event characteristics. Through a large-scale analysis over Twitter, we demonstrate that our approach consistently identifies and encodes events, outperforming existing methods. Comment: 13 single column pages, 5 figures, submitted to KDD 201

    Addition of elotuzumab to lenalidomide and dexamethasone for patients with newly diagnosed, transplantation ineligible multiple myeloma (ELOQUENT-1): an open-label, multicentre, randomised, phase 3 trial


    Real-time Ranking with Concept Drift Using Expert Advice

    In many practical applications, one is interested in generating a ranked list of items using information mined from continuous streams of data. For example, in the context of computer networks, one might want to generate lists of nodes ranked according to their susceptibility to attack. In addition, real-world data streams often exhibit concept drift, making the learning task even more challenging. We present an online learning approach to ranking with concept drift, using weighted majority techniques. By continuously modeling different snapshots of the data and tuning our measure of belief in these models over time, we capture changes in the underlying concept and adapt our predictions accordingly. We measure the performance of our algorithm on real electricity data as well as a synthetic data stream, and demonstrate that our approach to ranking from stream data outperforms previously known batch-learning methods and other online methods that do not account for concept drift
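For readers unfamiliar with weighted majority techniques, the sketch below shows the classic Weighted Majority update for binary predictions: each expert keeps a weight that is shrunk whenever it errs, so belief shifts toward experts that match the current concept. This is the textbook algorithm, not the paper's ranking method; the function name, the `beta` penalty, and the 0/1 expert interface are assumptions for illustration.

```python
from typing import Any, Callable, Iterable, List, Tuple

Expert = Callable[[Any], int]  # maps an input to a 0/1 prediction


def weighted_majority(
    experts: List[Expert],
    stream: Iterable[Tuple[Any, int]],
    beta: float = 0.5,
) -> Tuple[List[float], int]:
    """Run classic Weighted Majority over a labeled stream.

    Returns the final expert weights and the number of mistakes made
    by the combined (weighted vote) predictor.
    """
    weights = [1.0] * len(experts)
    mistakes = 0
    for x, y in stream:
        votes = [expert(x) for expert in experts]
        mass_one = sum(w for w, v in zip(weights, votes) if v == 1)
        mass_zero = sum(w for w, v in zip(weights, votes) if v == 0)
        # Predict the label backed by more total weight (ties go to 1).
        prediction = 1 if mass_one >= mass_zero else 0
        if prediction != y:
            mistakes += 1
        # Penalize every expert that was wrong on this example.
        weights = [w * beta if v != y else w for w, v in zip(weights, votes)]
    return weights, mistakes
```

In a drift setting, the experts would be models trained on different snapshots of the stream; when the underlying concept changes, stale models lose weight quickly and newer ones dominate the vote.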

    Identification and Characterization of Events in Social Media

    Millions of users share their experiences, thoughts, and interests online, through social media sites (e.g., Twitter, Flickr, YouTube). As a result, these sites host a substantial number of user-contributed documents (e.g., textual messages, photographs, videos) for a wide variety of events (e.g., concerts, political demonstrations, earthquakes). In this dissertation, we present techniques for leveraging the wealth of available social media documents to identify and characterize events of different types and scale. By automatically identifying and characterizing events and their associated user-contributed social media documents, we can ultimately offer substantial improvements in browsing and search quality for event content. To understand the types of events that exist in social media, we first characterize a large set of events using their associated social media documents. Specifically, we develop a taxonomy of events in social media, identify important dimensions along which they can be categorized, and determine the key distinguishing features that can be derived from their associated documents. We quantitatively examine the computed features for different categories of events, and establish that significant differences can be detected across categories. Importantly, we observe differences between events and other non-event content that exists in social media. We use these observations to inform our event identification techniques. To identify events in social media, we follow two possible scenarios. In one scenario, we do not have any information about the events that are reflected in the data. In this scenario, we use an online clustering framework to identify these unknown events and their associated social media documents. To distinguish between event and non-event content, we develop event classification techniques that rely on a rich family of aggregate cluster statistics, including temporal, social, topical, and platform-centric characteristics. 
In addition, to tailor the clustering framework to the social media domain, we develop similarity metric learning techniques for social media documents, exploiting the variety of document context features, both textual and non-textual. In our alternative event identification scenario, the events of interest are known, through user-contributed event aggregation platforms (e.g., Last.fm events, EventBrite, Facebook events). In this scenario, we can identify social media documents for the known events by exploiting known event features, such as the event title, venue, and time. While this event information is generally helpful and easy to collect, it is often noisy and ambiguous. To address this challenge, we develop query formulation strategies for retrieving event content on different social media sites. Specifically, we propose a two-step query formulation approach, with a first step that uses highly specific queries aimed at achieving high-precision results, and a second step that builds on these high-precision results, using term extraction and frequency analysis, with the goal of improving recall. Importantly, we demonstrate how event-related documents from one social media site can be used to enhance the identification of documents for the event on another social media site, thus contributing to the diversity of information that we identify. The number of social media documents that our techniques identify for each event is potentially large. To avoid overwhelming users with unmanageable volumes of event information, we design techniques for selecting a subset of documents from the total number of documents that we identify for each event. Specifically, we aim to select high-quality, relevant documents that reflect useful event information. 
For this content selection task, we experiment with several centrality-based techniques that consider the similarity of each event-related document to the central theme of its associated event and to other social media documents that correspond to the same event. We then evaluate both the relative and overall user satisfaction with the selected social media documents for each event. The existing tools to find and organize social media event content are extremely limited. This dissertation presents robust ways to organize and filter this noisy but powerful event information. With our event identification, characterization, and content selection techniques, we provide new opportunities for exploring and interacting with a diverse set of social media documents that reflect timely and revealing event content. Overall, the work presented in this dissertation provides an essential methodology for organizing social media documents that reflect event information, towards improved browsing and search for social media event data
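The centrality-based content selection described above can be illustrated with a minimal sketch: represent each document as a bag of words, build a centroid from all documents of the event, and keep the documents most similar to it. The whitespace tokenizer, raw term counts, and cosine similarity are assumptions for illustration; the dissertation abstract does not specify its representation or similarity measure.

```python
import math
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    """Naive lowercase whitespace tokenizer (an assumption of this sketch)."""
    return text.lower().split()


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(value * b.get(term, 0.0) for term, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_central(docs: List[str], k: int = 1) -> List[str]:
    """Return the k documents closest to the centroid of all documents."""
    vectors = [Counter(tokenize(d)) for d in docs]
    # Summed counts serve as the centroid; cosine ignores the scale.
    centroid: Counter = Counter()
    for vector in vectors:
        centroid.update(vector)
    ranked = sorted(
        zip(docs, vectors),
        key=lambda pair: cosine(pair[1], centroid),
        reverse=True,
    )
    return [doc for doc, _ in ranked[:k]]
```

For an event whose documents mostly discuss the same concert, an off-topic message shares few terms with the centroid and falls to the bottom of the ranking.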